Can Symbol Grounding Improve Low-Level NLP? Word Segmentation as a Case Study

Authors

  • Hirotaka Kameko
  • Shinsuke Mori
  • Yoshimasa Tsuruoka
Abstract

We propose a novel framework for improving a word segmenter using information acquired from symbol grounding. We generate a term dictionary in three steps: generating a pseudo-stochastically segmented corpus, building a symbol grounding model to enumerate word candidates, and filtering them according to the grounding scores. We applied our method to game records of Japanese chess with commentaries. The experimental results show that the accuracy of a word segmenter can be improved by incorporating the generated dictionary.
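The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the models, so the sketch assumes that each boundary between characters has an independent probability of being a word boundary, and it stands in for the symbol grounding model with a plain dictionary of scores. The names `sample_segmentation`, `collect_candidates`, `filter_by_score`, `boundary_probs`, and `grounding_score` are all hypothetical.

```python
import random
from collections import Counter


def sample_segmentation(text, boundary_probs, rng):
    """Sample one segmentation of `text`.

    boundary_probs[i] is the assumed probability of a word boundary
    between text[i] and text[i + 1], so it has len(text) - 1 entries.
    """
    words, start = [], 0
    for i, p in enumerate(boundary_probs, start=1):
        # rng.random() is in [0, 1): p == 1.0 always splits, p == 0.0 never does.
        if rng.random() < p:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


def collect_candidates(text, boundary_probs, n_samples, seed=0):
    """Steps 1 and 2: sample segmentations and count the words they yield."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        counts.update(sample_segmentation(text, boundary_probs, rng))
    return counts


def filter_by_score(candidates, grounding_score, threshold):
    """Step 3: keep candidates whose (stand-in) grounding score passes the threshold."""
    return sorted(w for w in candidates if grounding_score.get(w, 0.0) >= threshold)


if __name__ == "__main__":
    # Toy input; the probabilities and scores are made up for illustration.
    counts = collect_candidates("abcd", [0.1, 0.8, 0.1], n_samples=100)
    scores = {"ab": 0.9, "cd": 0.7, "a": 0.1}
    print(filter_by_score(counts, scores, threshold=0.5))
```

In a real system the surviving candidates would form the term dictionary that is handed to the word segmenter; how the paper computes grounding scores from game states is not covered by this sketch.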

Similar Resources

Corpus-Based Approaches to Semantic Interpretation in NLP

into empirical, corpus-based learning approaches to natural language processing (NLP). Most empirical NLP work to date has focused on relatively low-level language processing such as part-of-speech tagging, text segmentation, and syntactic parsing. The success of these approaches has stimulated research in using empirical learning techniques in other facets of NLP, including semantic analysis—un...

Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). But many non-Latin languages have hieroglyphic writing systems, involving a big alphabet with thousands or millions of characters. Each character is composed of even smaller parts, which are often ignored by the previous work. In this paper, we propose a novel architecture employing two s...

Dynamic Symbol Grounding, State Construction and the Problem of Teleology

Symbol grounding originated within the connectionist-symbolic debate as a way to bridge the gap between the two approaches. This paper provides an overview of recent results concerning symbol grounding, which are critically reviewed here. A thorough analysis reveals that symbol grounding parallels transcendental logic and is best viewed as automated model construction. If this diagnosis is...

Machine Symbol Grounding and Optimization

Autonomous systems gather high-dimensional sensorimotor data with their multimodal sensors. Symbol grounding is about whether these systems can, based on this data, construct symbols that serve as a vehicle for higher symbol-oriented cognitive processes. Machine learning and data mining techniques are geared towards finding structures and input-output relations in this data by providing appropr...

Weakly supervised learning of allomorphy

Most NLP resources that offer annotations at the word segment level provide morphological annotation that includes features indicating tense, aspect, modality, gender, case, and other inflectional information. Such information is rarely aligned to the relevant parts of the words—i.e. the allomorphs, as such annotation would be very costly. These unaligned weak labelings are commonly provided by...

Publication date: 2015